
Add initial LoRA finetuning support; vulkan OUT_PROD; vulkan cross-entropy-backward #5


Open: makaveli10 wants to merge 17 commits into temp-finetuning from lora-finetuning

Conversation

makaveli10

The PR adds:

  • LoRA finetuning support for both training a new adapter and finetuning an existing one. The adapter is saved at the end of the training run so it can be used for inference.
  • cuda: OUT_PROD Q8/Q4 for quantised LoRA finetuning.
  • vulkan: Added the OUT_PROD operator for fp32 to enable finetuning, plus OUT_PROD Q8/Q4 to enable quantised finetuning.
  • vulkan: Added cross-entropy-loss-backward to allow a lower context size, which is critical for training on mobile devices due to memory constraints (a conceptual sketch of both operators follows below).
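For readers unfamiliar with the two operators, here is a minimal CPU-side sketch of what they compute. The function names, signatures, and layouts are illustrative assumptions for this description only, not the actual ggml/CUDA/Vulkan kernels in the PR.

#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative OUT_PROD: dst[i][j] += sum_k a[i][k] * b[j][k], i.e. an
// accumulation of outer products over the shared dimension k. This is the
// kind of product a LoRA backward pass uses to build weight gradients from
// activations and output gradients. Row-major layout; the caller is assumed
// to have zeroed dst before the first call.
static void out_prod_f32(const float * a, const float * b, float * dst,
                         int rows_a, int rows_b, int cols) {
    for (int i = 0; i < rows_a; ++i) {
        for (int j = 0; j < rows_b; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < cols; ++k) {
                sum += a[i * cols + k] * b[j * cols + k];
            }
            dst[i * rows_b + j] += sum;
        }
    }
}

// Illustrative cross-entropy-loss-backward for a single token:
// grad[i] = d_loss * (softmax(logits)[i] - 1{i == target}).
static void cross_entropy_loss_back(const float * logits, int target,
                                    float d_loss, float * grad, int n) {
    float max_logit = logits[0];
    for (int i = 1; i < n; ++i) max_logit = std::max(max_logit, logits[i]);
    std::vector<float> p(n);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) { p[i] = std::exp(logits[i] - max_logit); sum += p[i]; }
    for (int i = 0; i < n; ++i) {
        grad[i] = d_loss * (p[i] / sum - (i == target ? 1.0f : 0.0f));
    }
}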


zoq commented Aug 19, 2025

Steps to build and test llama.cpp on Android:

  1. Install Termux from the Play Store and open it.
  2. Run apt update
  3. Run apt remove vulkan-loader-generic
  4. Run apt install git cmake vulkan-tools vulkan-headers shaderc vulkan-loader-android
  5. Run vulkaninfo --summary. This should show the driver and GPU information. If it's the stock driver, it shouldn't mention Mesa.
  6. Clone the repo inside Termux, cd into it, and make sure to check out the lora-finetuning branch:

git clone https://github.com/makaveli10/qvac-ext-lib-llama.cpp.git
cd qvac-ext-lib-llama.cpp
git checkout lora-finetuning

  7. Configure the Vulkan backend build with cmake -B build -DGGML_VULKAN=1
  8. Build it with cmake --build build --config Debug -j2
  9. Run termux-setup-storage and grant storage permissions to Termux.
  10. Outside Termux, download a model onto the phone from https://huggingface.co/prithivMLmods/Qwen3-0.6B-GGUF/tree/main (i.e. Qwen3_0.6B.Q8_0.gguf), then tap the file and select to open it with Termux.
  11. Click "Open Directory" on the prompt.
  12. The model should now be reachable inside Termux in the ~/downloads directory.
  13. For finetuning the 8-bit Qwen model:

./build/bin/llama-finetune-lora -m Qwen3_0.6B.Q8_0.gguf -f trump.txt -c 256 -b 256 -ub 256 -ngl 999

trump.txt dataset: https://github.com/user-attachments/files/21859494/trump.txt


zoq commented Aug 19, 2025

Command we used for testing the trained adapter:

./build/bin/llama-cli -m Qwen3_0.6B.Q8_0.gguf --lora trained-lora-adapter.gguf -if -p "What is your favorite pokemon?" -ngl 999


andrunko left a comment


Changes LGTM in general, just some small comments/nits overall, feel free to ignore the nitpicks :).

device->device.destroyBuffer(buffer);
device->device.freeMemory(device_memory);


The change looks good, but the commit message should be updated to remove "wip". It would also be good to explain in the commit message what specific crash this fixes.

@@ -93,4 +93,4 @@ int main(int argc, char ** argv) {
llama_backend_free();

return 0;
}
}


nit: nothing changed, I'd drop it from the commit.

@@ -3202,7 +3202,8 @@ static bool ggml_backend_cuda_device_supports_op(ggml_backend_dev_t dev, const g
}
} break;
case GGML_OP_OUT_PROD:
return op->type == GGML_TYPE_F32 && op->src[0]->type == GGML_TYPE_F32 && op->src[1]->type == GGML_TYPE_F32;
// return op->type == GGML_TYPE_F32 && op->src[0]->type == GGML_TYPE_F32 && op->src[1]->type == GGML_TYPE_F32;


Any reason to keep this prev code? We can always check git history if we need to revert.
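For reference, a hedged sketch of what the support check reduces to once the commented-out line is dropped. The exact set of accepted quantised types here is an assumption based on the PR description (OUT_PROD Q8/Q4), not the actual diff on the branch.

#include "ggml.h"

// Hypothetical helper mirroring the expanded GGML_OP_OUT_PROD support check;
// the real type list in the PR may differ.
static bool out_prod_supported(const struct ggml_tensor * op) {
    const enum ggml_type s0 = op->src[0]->type;
    return op->type == GGML_TYPE_F32 &&
           (s0 == GGML_TYPE_F32 || s0 == GGML_TYPE_Q8_0 || s0 == GGML_TYPE_Q4_0) &&
           op->src[1]->type == GGML_TYPE_F32;
}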

const float * src0_d = (const float *) src0->data;
const float * src1_d = (const float *) src1->data;
// const float * src0_d = (const float *) src0->data;
// const float * src1_d = (const float *) src1->data;


Same here and in other places, I would drop the old code in general.


if (allocated_src0) {
CUDA_CHECK(cudaFreeAsync(src0_f32, stream));
// printf("DEBUG: Freed dequantized src0 buffer\n");


nit: while here I would also remove these leftover debug prints - here and in other similar places.

case GGML_OP_ADD:
case GGML_OP_SUB:
case GGML_OP_MUL:
case GGML_OP_DIV:
return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&

andrunko, Aug 21, 2025


nit: spurious change?


andrunko commented Aug 21, 2025

Looks like there are some CI failures also related to these changes - see https://github.com/tetherto/qvac-ext-lib-llama.cpp/actions/runs/17076253696/job/48418341198?pr=5 for example:

/__w/qvac-ext-lib-llama.cpp/qvac-ext-lib-llama.cpp/src/llama-lora-training.cpp:293:29: error: the address of 'ggml_tensor::name' will never be NULL [-Werror=address]
  293 |     if (!tensor || !tensor->name) {
      |                     ~~~~~~~~^~~~
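For context, the warning is correct: ggml_tensor::name is a fixed-size char array (char name[GGML_MAX_NAME]), so tensor->name can never be a null pointer, the old check is dead code, and -Werror=address turns it into a build failure. A typical fix is to test for an empty name instead; the sketch below is illustrative, not necessarily the exact change applied on the branch.

#include "ggml.h"

// Illustrative replacement for `!tensor || !tensor->name`: since `name` is an
// inline array its address is always non-NULL, so check for an empty string.
static bool tensor_is_unnamed(const struct ggml_tensor * tensor) {
    return tensor == nullptr || tensor->name[0] == '\0';
}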


JamieBohannaWebDev commented Aug 22, 2025

Fine-tuning attempt on Pixel 9 Pro Fold; evidence in the screenshots attached below.

Please note the 27.5-hour estimated completion time...

[four screenshots attached]
